整个幻灯片图像(WSI)分类是诊断和治疗疾病的基本任务;但是,精确标签的策划是耗时的,并限制了完全监督的方法的应用。为了解决这个问题,多个实例学习(MIL)是一种流行的方法,它仅使用幻灯片级标签作为一个弱监督的学习任务。尽管当前的MIL方法将注意机制的变体应用于具有更强模型的重量实例特征,但注意力不足是对数据分布的属性的不足。在这项工作中,我们建议通过使用Max-Instance(关键)功能的统计数据来重新校准WSI袋(实例)的分布。我们假设在二进制MIL中,正面袋的特征幅度大于负面,因此我们可以强制执行该模型,以最大程度地利用公制特征损失的袋子之间的差异,该袋子将正面袋模型为未分布。为了实现这一目标,与使用单批训练模式的现有MIL方法不同,我们建议平衡批次采样以有效地使用功能丢失,即同时(+/-)袋子。此外,我们采用编码模块(PEM)的位置来建模空间/形态信息,并通过变压器编码器通过多头自我注意(PSMA)进行汇总。现有基准数据集的实验结果表明我们的方法是有效的,并且对最先进的MIL方法有所改善。
translated by 谷歌翻译
近年来,目睹了互联网上的科学阴谋视频,科学认识论和公众对科学的认识。学者们已经开始研究阴谋消息中使用的说服技术,例如不确定和恐惧,尤其是视觉叙述,特别是视觉叙述在视频中如何与传播阴谋传播的人相差。本文通过使用计算方法分析数百万帧,通过分析数百万帧来了解阴谋视频中的视觉框架,解决了这种差距。我们发现阴谋视频倾向于使用较低的颜色方差和亮度,尤其是在视频的缩略图和早期部分。本文还展示了研究人员如何在机器学习模型中集成文本和视觉特征,以研究社交媒体的阴谋,并探讨有兴趣在数字时代进行视觉操纵的学者计算建模的影响。本文呈现的视觉和文本特征的分析对于未来的研究专注于设计系统来识别互联网上的阴谋内容。
translated by 谷歌翻译
在本文中,我们提出了Sanane-TTS,这是一种稳定且自然的端到端多语言TTS模型。由于很难为给定的演讲者获得多语言语料库,因此不可避免地会使用单语语料库进行多语言TTS模型。我们介绍了扬声器正规化损失,该损失可改善跨语性合成期间的语音自然性以及域对抗训练,该训练适用于其他多语言TTS模型。此外,通过添加扬声器正规化损失,以持续时间为零矢量嵌入的扬声器可以稳定跨语性推断。通过此替代品,我们的模型将产生以中等节奏的语音,而不论跨语性合成中的源说话者如何。在MOS评估中,Sane-TTS在跨语义和内部合成中的自然性得分高于3.80,地面真相评分为3.99。同样,即使在跨语性的推论中,Sane-TTS也保持了接近地面真理的说话者相似性。音频样本可在我们的网页上找到。
translated by 谷歌翻译
According to the rapid development of drone technologies, drones are widely used in many applications including military domains. In this paper, a novel situation-aware DRL- based autonomous nonlinear drone mobility control algorithm in cyber-physical loitering munition applications. On the battlefield, the design of DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not available. Therefore, the approach in this paper is that cyber-physical virtual environment is constructed with Unity environment. Based on the virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, many obstacles exist which is harmful for linear trajectory control in real-world battlefield scenarios. Thus, our proposed autonomous nonlinear drone mobility control algorithm utilizes situation-aware components those are implemented with a Raycast function in Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight. Therefore, this approach is obviously beneficial for avoiding obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm is superior from the other linear mobility control algorithms.
translated by 谷歌翻译
Supervision for metric learning has long been given in the form of equivalence between human-labeled classes. Although this type of supervision has been a basis of metric learning for decades, we argue that it hinders further advances of the field. In this regard, we propose a new regularization method, dubbed HIER, to discover the latent semantic hierarchy of training data, and to deploy the hierarchy to provide richer and more fine-grained supervision than inter-class separability induced by common metric learning losses. HIER achieved this goal with no annotation for the semantic hierarchy but by learning hierarchical proxies in hyperbolic spaces. The hierarchical proxies are learnable parameters, and each of them is trained to serve as an ancestor of a group of data or other proxies to approximate the semantic hierarchy among them. HIER deals with the proxies along with data in hyperbolic space since geometric properties of the space are well-suited to represent their hierarchical structure. The efficacy of HIER was evaluated on four standard benchmarks, where it consistently improved performance of conventional methods when integrated with them, and consequently achieved the best records, surpassing even the existing hyperbolic metric learning technique, in almost all settings.
translated by 谷歌翻译
Steering language generation towards objectives or away from undesired content has been a long-standing goal in utilizing language models (LM). Recent work has demonstrated reinforcement learning and weighted decoding as effective approaches to achieve a higher level of language control and quality with pros and cons. In this work, we propose a novel critic decoding method for controlled language generation (CriticControl) that combines the strengths of reinforcement learning and weighted decoding. Specifically, we adopt the actor-critic framework to train an LM-steering critic from non-differentiable reward models. And similar to weighted decoding, our method freezes the language model and manipulates the output token distribution using called critic, improving training efficiency and stability. Evaluation of our method on three controlled generation tasks, namely topic control, sentiment control, and detoxification, shows that our approach generates more coherent and well-controlled texts than previous methods. In addition, CriticControl demonstrates superior generalization ability in zero-shot settings. Human evaluation studies also corroborate our findings.
translated by 谷歌翻译
Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited to surrogate real-world scenarios and that the current TOD models are still a long way from unraveling the scenarios. In this position paper, we first identify current status and limitations of SF-TOD systems. After that, we explore the WebTOD framework, the alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system learns how to understand the web/mobile interface that the human agent interacts with, powered by a large-scale language model.
translated by 谷歌翻译
Recent studies have proposed a unified user modeling framework that leverages user behavior data from various applications. Most benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling for user history corpus help improve recommender systems? While its versatile usability has been widely investigated in many domains, its applications to recommender systems still remain underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.
translated by 谷歌翻译
While witnessing the noisy intermediate-scale quantum (NISQ) era and beyond, quantum federated learning (QFL) has recently become an emerging field of study. In QFL, each quantum computer or device locally trains its quantum neural network (QNN) with trainable gates, and communicates only these gate parameters over classical channels, without costly quantum communications. Towards enabling QFL under various channel conditions, in this article we develop a depth-controllable architecture of entangled slimmable quantum neural networks (eSQNNs), and propose an entangled slimmable QFL (eSQFL) that communicates the superposition-coded parameters of eS-QNNs. Compared to the existing depth-fixed QNNs, training the depth-controllable eSQNN architecture is more challenging due to high entanglement entropy and inter-depth interference, which are mitigated by introducing entanglement controlled universal (CU) gates and an inplace fidelity distillation (IPFD) regularizer penalizing inter-depth quantum state differences, respectively. Furthermore, we optimize the superposition coding power allocation by deriving and minimizing the convergence bound of eSQFL. In an image classification task, extensive simulations corroborate the effectiveness of eSQFL in terms of prediction accuracy, fidelity, and entropy compared to Vanilla QFL as well as under different channel conditions and various data distributions.
translated by 谷歌翻译
The cone-beam computed tomography (CBCT) provides 3D volumetric imaging of a target with low radiation dose and cost compared with conventional computed tomography, and it is widely used in the detection of paranasal sinus disease. However, it lacks the sensitivity to detect soft tissue lesions owing to reconstruction constraints. Consequently, only physicians with expertise in CBCT reading can distinguish between inherent artifacts or noise and diseases, restricting the use of this imaging modality. The development of artificial intelligence (AI)-based computer-aided diagnosis methods for CBCT to overcome the shortage of experienced physicians has attracted substantial attention. However, advanced AI-based diagnosis addressing intrinsic noise in CBCT has not been devised, discouraging the practical use of AI solutions for CBCT. To address this issue, we propose an AI-based computer-aided diagnosis method using CBCT with a denoising module. This module is implemented before diagnosis to reconstruct the internal ground-truth full-dose scan corresponding to an input CBCT image and thereby improve the diagnostic performance. The external validation results for the unified diagnosis of sinus fungal ball, chronic rhinosinusitis, and normal cases show that the proposed method improves the micro-, macro-average AUC, and accuracy by 7.4, 5.6, and 9.6% (from 86.2, 87.0, and 73.4 to 93.6, 92.6, and 83.0%), respectively, compared with a baseline while improving human diagnosis accuracy by 11% (from 71.7 to 83.0%), demonstrating technical differentiation and clinical effectiveness. This pioneering study on AI-based diagnosis using CBCT indicates denoising can improve diagnostic performance and reader interpretability in images from the sinonasal area, thereby providing a new approach and direction to radiographic image reconstruction regarding the development of AI-based diagnostic solutions.
translated by 谷歌翻译